Dependent Dirichlet Process Rating Model (DDP-RM) 



Ken Akira Fujimoto
and
George Karabatsos

University of Illinois-Chicago


12/20/2012 






Acknowledgements: This research is supported by NSF research grant SES- 1156372, from 
the Program in Methodology, Measurement, and Statistics. This paper will be presented at 
the National Council for Measurement in Education (NCME) Conference, April 26-30, at 
San Francisco. Also, we thank Professor Stephen G. Walker, University of Kent, for helpful 
conversations about the MCMC algorithm we used for the paper. 







Abstract 

Rating scale items are ubiquitous in psychometric practice. Yet, the psychometric properties of the rating scales can often vary by examinee, as well as by item. To address this practical psychometric problem, we introduce a novel, Bayesian nonparametric IRT model for rating scale items. The model is an infinite-mixture of Rasch partial credit models, with rating thresholds being the random parameters that are subject to the mixture, and with (infinitely-many) covariate-dependent stick-breaking weights. Random parameters and the mixture weights are assigned a Dependent Dirichlet process (DDP) prior distribution. Thus, the novel model allows the rating category thresholds to vary flexibly across items and examinees, and allows the distribution of the category thresholds to vary flexibly as a function of covariates. We illustrate the novel model through the analysis of a real rating data set that has been studied extensively in the psychometric modeling literature. The model is shown to have better predictive performance than other IRT rating models of common usage.

KEYWORDS: Rating Scale Analysis, Bayesian Nonparametrics, Bayesian Inference 
RUNNING TITLE: Dependent Dirichlet Process Rating Model. 






1 Introduction 



Item response theory (IRT) provides a modeling approach for estimating the psychometric properties of data gathered with tests consisting of rating scale items. An IRT rating scale model provides useful information about the qualities of the given test, such as the difficulty of each test item and the response thresholds of the rating categories. IRT rating models in common use include the Rasch rating scale model (Andrich, 1978), partial credit models (Masters, 1982; Muraki, 1992), and the family of graded response models (Samejima, 1969, 1972). All of these IRT models have seen many successful applications in a wide range of research settings.

Still, there is room to further develop IRT rating models. Typical IRT rating models assume that the rating category threshold parameters do not vary across examinees. However, this assumption is questionable, as it is reasonable to expect that rating scale usage also varies across examinees. If the assumption is empirically violated, then the IRT rating model may fit the data poorly. Moreover, in practice, it may also be of interest to investigate how rating scale usage (thresholds) varies across examinees, in order to investigate differential item functioning (DIF).

In principle, one may relax this assumption through the specification of a discrete-mixture IRT model, with the rating threshold parameters being the random parameters that are subject to the mixture. In general, a discrete mixture model has the form (e.g., McLachlan & Peel, 2000):

$$f_{G_x}(y \mid x) = \int f(y \mid x; \Psi(x))\, dG_x(\Psi) = \sum_{h=1}^{H} f(y \mid x; \Psi_h(x))\, \omega_h(x),$$

given a, possibly covariate ($x$) dependent, mixing distribution $G_x$, component indices $h = 1, \ldots, H$, kernel (component) densities $f(y \mid x; \Psi_h(x))$ ($h = 1, \ldots, H$), and mixing weights $(\omega_h(x))_{h=1}^{H}$ which sum to 1 at every $x \in \mathcal{X}$. A mixture IRT rating model treats $y \in \{k = 0, 1, \ldots, m\}$ as a rating, and each of the kernel densities $f(y \mid x; \Psi_h(x)) = f(y \mid \theta, \tau_h)$ ($h = 1, \ldots, H$) is chosen as an IRT model, such as the Rasch partial credit model (PCM):

$$f(y \mid \theta, \tau_h) = P(Y = y \mid \theta, \tau_h) = \frac{\exp\left(y\theta - \sum_{l=0}^{y} \tau_{lh}\right)}{\sum_{k=0}^{m} \exp\left(k\theta - \sum_{l=0}^{k} \tau_{lh}\right)},$$

where $(\tau_{0h}, \ldots, \tau_{mh})$ are the rating category threshold parameters corresponding to the $h$th mixture component. Also, in typical IRT mixture models, the examinee ability parameter $\theta$ is assigned a normal prior distribution. However, none of the available mixture IRT rating models treats the rating category thresholds as parameters that are subject to the mixture (Rost, 1991; Smit, Kelderman, & van der Flier, 2003; von Davier & Yamamoto, 2004; Frick, Strobl, Leisch, & Zeileis, 2012). Moreover, most of these models are finite mixtures (i.e., $H < \infty$), which have limited flexibility to adequately describe many rating scale data sets. We could achieve maximum modeling flexibility in a fully nonparametric framework, through the specification of an infinite-mixture model (i.e., $H = \infty$).

To address these practical limitations of existing IRT models, we introduce a novel, Bayesian nonparametric IRT rating model. The model is an infinite-mixture of Rasch partial credit models, with rating category threshold parameters subject to the mixture, and with infinitely-many covariate-dependent stick-breaking weights. Specifically, the random parameters and the mixture weights are modeled by a Dependent Dirichlet process (DDP) (MacEachern, 1999, 2000, 2001), namely the local Dirichlet process (lDP) (Chung & Dunson, 2011). Therefore, we refer to our novel model as the DDP Rating Model (DDP-RM).

Our model adds to the body of literature that incorporates Dirichlet process (DP) priors for IRT, such as DP mixtures of 3-parameter logistic model item parameters (Miyazaki & Hoshino, 2009), DP mixtures of Rasch model ability parameters (San Martin et al., 2011), and a DDP model for the link function of the 2-parameter IRT model (Duncan & MacEachern, 2008). However, all of these models focus on dichotomous item scores, and nearly all of them assume a less flexible DP, which has no covariate dependence. Karabatsos and Walker (2012, to appear) provide a review of DP and DDP mixture models for IRT.

In Section 2, we introduce our Bayesian nonparametric infinite-mixture IRT model for rating scale data, after giving a necessary review of the DP, the DDP, and the general DDP infinite-mixture model. Appendix A describes the Markov chain Monte Carlo (MCMC) algorithm that is used to estimate the posterior distribution of our model. In Section 3, we illustrate our model on a real data set of rating scale items, which has been extensively studied in the psychometric modeling literature (De Boeck & Wilson, 2004). There, we also compare the goodness of predictive fit of our model against other IRT rating scale models of common usage. Section 4 ends with conclusions, and with some discussion about future modeling extensions.

2 Nonparametric Infinite-mixture IRT 

2.1 Dependent Dirichlet Process (DDP) 

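Throughout, $\mathrm{n}_p(\cdot \mid \cdot, \cdot)$, $\mathrm{ga}(\cdot \mid \cdot, \cdot)$, $\mathrm{beta}(\cdot \mid \cdot, \cdot)$, and $\mathrm{un}(\cdot \mid \cdot, \cdot)$ denote the density functions of the $p$-variate $\mathrm{Normal}_p(\cdot \mid \cdot, \cdot)$, $\mathrm{Gamma}(\cdot \mid \cdot, \cdot)$, $\mathrm{Beta}(\cdot \mid \cdot, \cdot)$, and $\mathrm{Uniform}(\cdot \mid \cdot, \cdot)$ distribution functions, respectively, where the gamma distribution is parameterized by shape and scale parameters.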

We first review the DP, which is the basis of the DDP. Let $G$ denote a random distribution. If this random distribution is a Dirichlet process (DP), it is denoted by $G \sim \mathrm{DP}(\alpha, G_0)$, with precision parameter $\alpha > 0$, baseline distribution $G_0$, mean $\mathrm{E}[G(\cdot)] = G_0(\cdot)$, and variance $\mathrm{Var}[G(\cdot)] = G_0(\cdot)[1 - G_0(\cdot)]/(\alpha + 1)$ (Ferguson, 1973). Also, a Dirichlet process $G \sim \mathrm{DP}(\alpha, G_0)$ random distribution can be constructed according to a stick-breaking process (Sethuraman, 1994), via:

$$G(\cdot) = \sum_{h=1}^{\infty} \omega_h\, \delta_{\theta_h}(\cdot), \tag{1}$$

according to stick-breaking weights defined by:

$$\omega_h = v_h \prod_{r=1}^{h-1} (1 - v_r), \quad h = 1, 2, \ldots,$$

which sum to 1 almost surely, given random variates

$$v_1, v_2, \ldots \overset{\text{i.i.d.}}{\sim} \mathrm{Beta}(1, \alpha); \quad \theta_1, \theta_2, \ldots \overset{\text{i.i.d.}}{\sim} G_0,$$

where $\delta_{\theta_h}(\cdot)$ denotes a point-mass distribution function that assigns probability 1 to the value $\theta_h$. Hence, a random DP distribution, $G \sim \mathrm{DP}(\alpha, G_0)$, is formed by an infinite-mixture of such point-mass distributions, and is almost surely discrete.
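The stick-breaking construction can be illustrated with a short simulation. The following is only a sketch under stated assumptions: the infinite sum is truncated at a hypothetical `n_atoms` components, the baseline $G_0$ is taken to be a standard normal purely for illustration, and the function name is ours rather than part of any software used in this paper.

```python
import random

def stick_breaking_weights(alpha, n_atoms, rng):
    """Draw the first n_atoms stick-breaking weights of a DP(alpha, G0).

    Returns the weights and the stick length left over after truncation.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)   # v_h ~ Beta(1, alpha)
        weights.append(v * remaining)     # omega_h = v_h * prod_{r<h} (1 - v_r)
        remaining *= 1.0 - v
    return weights, remaining

rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=1.0, n_atoms=50, rng=rng)
# Atoms theta_h drawn i.i.d. from an assumed baseline G0 = Normal(0, 1).
atoms = [rng.gauss(0.0, 1.0) for _ in weights]
```

The truncated weights plus the leftover stick length always total 1, and the leftover shrinks quickly as more atoms are added, which is why a finite truncation is often an adequate illustration of the infinite sum.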

The DDP extends the DP to regression settings, by providing a model for a covariate-dependent random distribution $G_x$ (MacEachern, 1999, 2000, 2001). A DDP has the stick-breaking representation:

$$G_x(\cdot) = \sum_{h=1}^{\infty} \omega_h(x)\, \delta_{\theta_h(x)}(\cdot), \tag{2}$$

where the stick-breaking weights $\omega_h(x) = v_h(x) \prod_{r=1}^{h-1} (1 - v_r(x))$ ($h = 1, 2, \ldots$) sum to 1 at every covariate value $x \in \mathcal{X}$, with $v_r(x): \mathcal{X} \to [0, 1]$. The specification of a DDP model is completed by the specification of a prior distribution for the mixture weights $\{\omega_h(x)\}_{h=1,2,\ldots}$ and atoms $\{\theta_h(x)\}_{h=1,2,\ldots}$, which are infinite collections of processes indexed by the $x$-space. Such a prior distribution is called a DDP prior, and is a prior for the distribution $G_x$.

An example of a DDP prior is the local DP (lDP; Chung & Dunson, 2011). The lDP makes use of a predictor-dependent set $\mathcal{L}_x = \{h : d(x, \Gamma_h) < \psi\} \subset \{1, 2, \ldots\}$, which indexes the locations $\Gamma_h$ belonging to the neighborhood of $x$ of size $\psi > 0$, where $d(\cdot, \cdot)$ is a chosen distance measure (e.g., Euclidean). Then the lDP models stick-breaking mixture weights $\{\omega_h(x)\}_{h=1,2,\ldots}$ and atoms $\{\theta_h(x)\}_{h=1,2,\ldots}$ as locally covariate-dependent, through the specification of local random components

$$\Gamma(x) = \{\Gamma_h, h \in \mathcal{L}_x\}, \quad v(x) = \{v_h, h \in \mathcal{L}_x\}, \quad \theta(x) = \{\theta_h, h \in \mathcal{L}_x\}.$$

Specifically, the lDP constructs a random covariate-dependent mixing distribution:

$$G_x(\cdot) = \sum_{h=1}^{|\mathcal{L}_x|} \omega_h(x)\, \delta_{\theta_{\pi_h(x)}}(\cdot),$$

on the basis of infinitely-many mixture weights that are defined by:

$$\omega_h(x) = v_{\pi_h(x)} \prod_{\{l \in \mathcal{L}_x : \pi_l(x) < \pi_h(x)\}} (1 - v_{\pi_l(x)}),$$

where $|\mathcal{L}_x|$ is the cardinality of the set $\mathcal{L}_x$, and the $\{\pi_l(x) : l \in \mathcal{L}_x\}$ indicate the ordering of the indices $l \in \mathcal{L}_x$ on the basis of $x$. The lDP is completed by the specification of a prior distribution on $\{\Gamma(x), v(x), \theta(x) : x \in \mathcal{X}\}$, to define a DDP prior on $G_x$.
Using a DDP, it is possible to construct an infinite-mixture model via

$$\int f(y \mid x; \Psi(x))\, dG_x(\Psi) = \sum_{h=1}^{\infty} f(y \mid x; \Psi_h(x))\, \omega_h(x), \tag{3}$$

where the $f(y \mid x; \Psi_h(x))$ represent chosen kernel densities, the covariate-dependent random mixing distribution $G_x$ is assigned a DDP prior, and the $\omega_h(x)$ are covariate-dependent stick-breaking weights. Such a model is referred to as a DDP mixture model, extending the idea of a DP mixture model (Lo, 1984), which assumes that the mixture distribution $G$ is not covariate-dependent.

2.2 The Dependent Dirichlet Process Rating Model (DDP-RM) 

The DDP-RM is a DDP mixture model defined by:

$$\int f(y \mid \theta, \tau)\, dG_x(\tau) = \sum_{h=1}^{\infty} f(y \mid \theta, \tau_h)\, \omega_h(x^{\top}\gamma),$$

where the kernels $f(y \mid \theta, \tau_h)$ are specified by the Rasch partial credit model:

$$f(y \mid \theta, \tau_h) = P(Y = y \mid \theta, \tau_h) = \frac{\exp\left(y\theta - \sum_{l=0}^{y} \tau_{lh}\right)}{\sum_{k=0}^{m} \exp\left(k\theta - \sum_{l=0}^{k} \tau_{lh}\right)}, \tag{4}$$

where for the $m + 1$ rating categories $k = 0, 1, \ldots, m$, we assume the constraint $\tau_{0h} = 0$, with free threshold parameters $\tau_h = (\tau_{1h}, \ldots, \tau_{mh})$ for the $h$th mixture component. Throughout, we denote the ability parameter of a given examinee $t$ by $\theta_t$. The stick-breaking mixture weights $\{\omega_h(x)\}_{h=1,2,\ldots}$ and atoms $\{\tau_h(x)\}_{h=1,2,\ldots}$ are modeled by an lDP prior, as detailed below. Moreover, $x$ can be a general vector of covariates, which may, for example, describe examinee characteristics (gender, race, and/or socioeconomic status) or test characteristics (time at which the item was administered, item type, etc.). In the next section, where we provide an empirical illustration of our model, we consider the case where $x$ consists of item (0-1) indicators.
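As an illustration of the partial credit kernel in equation (4), the category probabilities for one examinee and one mixture component can be computed as in the sketch below. The function name and the numerical values of the ability and thresholds are hypothetical and are not estimates from this paper; the constraint that the zeroth threshold equals 0 is built into the cumulative sums.

```python
import math

def pcm_probs(theta, tau):
    """Category probabilities under the Rasch partial credit model.

    theta: examinee ability.
    tau:   free thresholds (tau_1, ..., tau_m); tau_0 = 0 is implied.
    Returns [P(Y=0), ..., P(Y=m)].
    """
    m = len(tau)
    # Cumulative threshold sums: sum_{l=0}^{k} tau_l, with tau_0 = 0.
    cum = [0.0]
    for t in tau:
        cum.append(cum[-1] + t)
    logits = [k * theta - cum[k] for k in range(m + 1)]
    # Subtract the largest logit before exponentiating, for numerical stability.
    top = max(logits)
    unnorm = [math.exp(v - top) for v in logits]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical three-category item (m = 2) and ability value.
probs = pcm_probs(theta=0.5, tau=[-0.5, 1.0])
```

When the ability equals a threshold in a two-category item, the two categories are equally likely, which is a quick sanity check on the parameterization.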

The mixing distribution in our model is formed according to the following novel modification of the lDP (Chung & Dunson, 2011). First let

$$\mathcal{L}_x = \{h : d(x^{\top}\gamma, h) < \psi(x)\} \subset \{1, 2, \ldots\}$$

denote the subset of mixture component indices $h \in \mathbb{Z}^{+}$ having fixed addresses $\{\Gamma_h = h\}$ which are within a $\psi(x)$-neighborhood around the linear predictor $x^{\top}\gamma$. Then under our formulation of the lDP, the local variables are defined by $v(x^{\top}\gamma) = \{v_h, h \in \mathcal{L}_x\}$ for the specification of stick-breaking mixture weights

$$\omega_h(x^{\top}\gamma) = v_h(x^{\top}\gamma) \prod_{\{l \in \mathcal{L}_x : l < h\}} (1 - v_l(x^{\top}\gamma)), \quad h \in \mathcal{L}_x, \tag{5}$$

and by rating category threshold atoms $\tau(x^{\top}\gamma) = \{\tau_h, h \in \mathcal{L}_x\}$. We fix $v_{\max(\mathcal{L}_x)}(x^{\top}\gamma) := 1$ to ensure that the mixture weights $\omega_h(x^{\top}\gamma)$ sum to 1 for each $x$ (Chung & Dunson, 2011). Thus, our lDP forms stick-breaking mixture weights by selecting the strict subset of stick-breaking parameters ($\{v_h\}$) and atoms ($\{\tau_h\}$) that are within a neighborhood centered around (a linearized) $x$. For example, when $x^{\top}\gamma = 10$ and $\psi(x) = 2.5$, the covariate ($x$)-dependent local subset becomes $\mathcal{L}_x = \{8, 9, 10, 11, 12\}$.
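The neighborhood example and the local weights of equation (5) can be reproduced with the sketch below. It assumes that the distance is the absolute difference between the linear predictor and the integer address, that the search for indices can be capped at a hypothetical `max_index`, and that every stick-breaking variate equals 0.5 for the sake of the demonstration; the function names are ours.

```python
def local_indices(lin_pred, psi, max_index=10_000):
    """Indices h in {1, 2, ...} with |lin_pred - h| < psi (fixed addresses equal to h)."""
    return [h for h in range(1, max_index + 1) if abs(lin_pred - h) < psi]

def local_weights(v, indices):
    """Stick-breaking weights of equation (5), restricted to the local index set.

    v: dict mapping a component index h to its stick-breaking variate v_h.
    The variate at max(indices) is fixed to 1 so that the weights sum to 1.
    """
    weights = {}
    remaining = 1.0
    last = max(indices)
    for h in sorted(indices):
        v_h = 1.0 if h == last else v[h]
        weights[h] = v_h * remaining   # v_h * prod_{l in L_x, l < h} (1 - v_l)
        remaining *= 1.0 - v_h
    return weights

idx = local_indices(lin_pred=10.0, psi=2.5)     # [8, 9, 10, 11, 12]
w = local_weights({h: 0.5 for h in idx}, idx)   # hypothetical v_h = 0.5 for all h
```

With all variates equal to 0.5, the weights halve at each step (0.5, 0.25, 0.125, 0.0625) and the last component absorbs the remaining 0.0625, so the five weights sum to 1.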

Then the mixture weights of equation (5) give rise to a covariate-dependent mixing distribution:

$$G_x(\cdot) = \sum_{h \in \mathcal{L}_x} \omega_h(x^{\top}\gamma)\, \delta_{\tau_h(x^{\top}\gamma)}(\cdot).$$

Thus, for two covariates $x$ and $x'$, the level of similarity between $\mathcal{L}_x$ and $\mathcal{L}_{x'}$ determines the level of similarity between the two corresponding mixing distributions $G_x(\cdot)$ and $G_{x'}(\cdot)$, with the level of similarity controlled by the parameters $(\gamma, \psi(x))$.

The DDP-RM is completed by the specification of the following prior distributions:

$$\begin{aligned}
\theta_t &\sim \mathrm{Normal}(0, 1), \\
\tau_h &\sim \mathrm{Normal}_{m}(0, \Sigma_\tau), \\
v_h &\sim \mathrm{Beta}(1, \alpha), \\
\alpha &\sim \mathrm{Gamma}(a_\alpha, b_\alpha), \\
\gamma &\sim \pi_\gamma, \\
\psi(x) &\sim \pi_\psi,
\end{aligned}$$

where $\pi_\gamma$ and $\pi_\psi$ are generic prior densities. In the next section, where we illustrate our model, we consider a useful choice of the prior distributions listed above.

A unique feature of our model is that it clusters item category thresholds on the basis of similar mixing distributions, which is captured through the neighborhood-inducing parameter $\gamma$. When two separate $\gamma$s have the same value, the mixture components are the same for the covariates associated with the two $\gamma$s. In the present study, because the covariates are item indicators, similar $\gamma$s would suggest that the items associated with those $\gamma$s have similar mixing distributions describing the random relative category thresholds, thus possibly suggesting that a common set of thresholds could be specified for this group of items. Another unique feature of our model is that it forms the mixing distribution nonparametrically and allows the mixing distribution to depend on covariates.

For notational convenience, denote a sample set of rating data by $\mathcal{D}_n = \{(x_i, y_i)\}_{i=1}^{n = NJ}$, provided by $N$ examinees ($t = 1, \ldots, N$) on $J$ test items ($j = 1, \ldots, J$), with $n = NJ$ giving the total number of item responses in the data set. Each $y_i \in \mathcal{D}_n$ denotes a rating by a particular examinee on a particular item. According to standard arguments of probability theory involving Bayes' theorem, given a data set $\mathcal{D}_n$ having likelihood $\prod_{i=1}^{n} f(y_i \mid x_i; \zeta)$ under our model with parameters $\zeta = (\theta, \tau, v, \alpha, \gamma, \psi)$, and given a proper prior density $\pi(\zeta)$ defined over the space of $\zeta$, the posterior density of $\zeta$ is proper and given by:

$$\pi(\zeta \mid \mathcal{D}_n) \propto \prod_{i=1}^{n} f(y_i \mid x_i; \zeta)\, \pi(\zeta)$$

up to a proportionality constant. Then the posterior predictive density of $Y$ for a chosen $x$ is given by:

$$f_n(y \mid x) = \int f(y \mid x; \zeta)\, \pi(\zeta \mid \mathcal{D}_n)\, d\zeta$$

with this density corresponding to posterior predictive mean (expectation) and variance (Var)

$$\mathrm{E}_n(Y \mid x) = \int y\, f_n(y \mid x)\, dy, \quad \mathrm{Var}_n(Y \mid x) = \int \{y - \mathrm{E}_n(Y \mid x)\}^2 f_n(y \mid x)\, dy.$$
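Because a rating takes only the values 0 to m, the integrals in the posterior predictive mean and variance reduce to finite sums, and the posterior predictive density can be approximated by averaging the model's category probabilities over saved MCMC draws. The sketch below illustrates that calculation; the two draws are hypothetical numbers chosen for the example, and the function name is ours.

```python
def predictive_moments(pmf_draws):
    """Posterior predictive pmf, mean, and variance of a rating Y in {0, ..., m}.

    pmf_draws: list of pmfs, one per saved MCMC draw; each pmf is a list
               [P(Y=0 | x, draw), ..., P(Y=m | x, draw)].
    """
    n_draws = len(pmf_draws)
    n_cats = len(pmf_draws[0])
    # Monte Carlo estimate of f_n(y | x): average the pmf over the draws.
    f_n = [sum(p[k] for p in pmf_draws) / n_draws for k in range(n_cats)]
    mean = sum(k * f_n[k] for k in range(n_cats))
    var = sum((k - mean) ** 2 * f_n[k] for k in range(n_cats))
    return f_n, mean, var

# Two hypothetical posterior draws of the category probabilities for one response.
draws = [[0.2, 0.5, 0.3], [0.4, 0.4, 0.2]]
f_n, mean, var = predictive_moments(draws)
```

For these two draws the averaged pmf is (0.30, 0.45, 0.25), which gives a predictive mean of 0.95 and a predictive variance of 0.5475.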






We make use of the MCMC sampling methods for Bayesian infinite-mixture models that are described by Kalli, Griffin, and Walker (2011), to perform inference of the posterior density $\pi(\zeta \mid \mathcal{D}_n)$ and posterior predictive density $f_n(y \mid x)$ of the model, and inference of all posterior functionals of interest. These methods are based on the use of strategic latent variables. As mentioned, Appendix A provides more details about all the conditional posterior distributions of the model, which are sampled at each stage of the MCMC algorithm.

2.3 Model Assessment of Predictive Performance 

Given a set of data $\mathcal{D}_n$, suppose it is of interest to compare the predictive performance of $M$ different IRT rating models, with each model indexed by $m = 1, \ldots, M$. It is possible to compare the predictive performance of the models using a mean-squared predictive error criterion, namely the $D_1(m)$ criterion (Gelfand & Ghosh, 1998). For a given model $m \in \{1, \ldots, M\}$ under comparison, the criterion is defined by:

$$D_1(m) = \sum_{i=1}^{n} \left[y_i - \mathrm{E}_n(Y_i \mid x_i, m)\right]^2 + \sum_{i=1}^{n} \mathrm{Var}_n(Y_i \mid x_i, m) = \mathrm{GF}(m) + \mathrm{Pen}(m).$$

In the equation above, the first term is a predictive bias measure which indicates the goodness-of-fit (GF($m$)) of the model. The second term is a penalty (Pen($m$)) which is large when the model is either over-fitting or under-fitting the given data set $\mathcal{D}_n$. For the non-Bayesian comparison models, the $\mathrm{E}_n(Y_i \mid x_i, m)$ and $\mathrm{Var}_n(Y_i \mid x_i, m)$ are derived from marginal maximum or conditional maximum likelihood parameter estimates. That is, for a non-Bayesian model having point estimate $\hat{\zeta}_n = \hat{\zeta}(\mathcal{D}_n)$, such as a maximum likelihood estimate, the criterion is estimated via $\mathrm{E}_n(Y_i \mid x_i, m) = \mathrm{E}(Y_i \mid x_i, m, \hat{\zeta}_n)$ and $\mathrm{Var}_n(Y_i \mid x_i, m) = \mathrm{Var}(Y_i \mid x_i, m, \hat{\zeta}_n)$ ($i = 1, \ldots, n$) (Gelfand & Ghosh, 1998).
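The mean-squared predictive error criterion is simple to compute once the predictive means and variances are available for every observed rating. The following sketch uses three hypothetical observations with made-up predictive moments; the function name is ours.

```python
def predictive_error_criterion(y, pred_mean, pred_var):
    """Goodness-of-fit plus penalty, as in the criterion of Section 2.3.

    y:         observed ratings y_i.
    pred_mean: predictive means E_n(Y_i | x_i, m).
    pred_var:  predictive variances Var_n(Y_i | x_i, m).
    Returns (criterion, GF, Pen).
    """
    gf = sum((yi - mu) ** 2 for yi, mu in zip(y, pred_mean))
    pen = sum(pred_var)
    return gf + pen, gf, pen

# Hypothetical ratings and predictive moments for three responses.
y = [0, 2, 1]
pred_mean = [0.5, 1.5, 1.0]
pred_var = [0.25, 0.25, 0.5]
crit, gf, pen = predictive_error_criterion(y, pred_mean, pred_var)
```

Here GF is 0.5 and Pen is 1.0, so the criterion equals 1.5; smaller values indicate better predictive performance.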

3 Model Illustration 

In this section, we compare the predictive performance of the DDP-RM to several other IRT rating models, on a real data set obtained from the verbal aggression study (De Boeck & Wilson, 2004). The verbal aggression data set contains the item ratings of 316 students (243 females and 73 males) from a Dutch-speaking Belgian university. Each student rated 24 items, which are indicators of levels of verbal aggression (e.g., "A bus fails to stop for me. I would want to curse."), on a scale of 0 = no, 1 = perhaps, and 2 = yes. The items are categorized into a 2 x 2 x 3 design: Behavior Mode (Want or Do) by Situation Type (Other-to-blame or Self-to-blame) by Behavior Type (Curse, Scold, or Shout). Appendix B lists all 24 items.

3.1 Model Specifications and MCMC Diagnostics 

When applying the DDP-RM to analyze the verbal aggression data set, we assigned a proper prior $\theta_t \overset{\text{iid}}{\sim} \mathrm{n}(0, 1)$, along with high-variance proper priors $\tau_h \overset{\text{iid}}{\sim} \mathrm{n}(0, 5I_m)$, $\gamma \sim \mathrm{un}(1, 745)$, and $\psi(x_j) \sim \mathrm{un}(.5, 20)$, to reflect the limited prior information about these model parameters.

We ran the MCMC sampling algorithm for 200,000 iterations. We discarded the first 100,000 MCMC samples (i.e., the burn-in period) and saved every fifth sample thereafter, for a total of 20,000 saved posterior samples. We used standard methods to examine whether our MCMC algorithm (presented in Appendix A) generated a sufficiently large number of samples from the posterior distribution. Given a finite number $S$ of samples $\{\zeta^{(s)}\}_{s=1}^{S}$ generated by the MCMC algorithm, univariate trace plots can be used to evaluate the mixing of the chain (i.e., the extent to which the chain explores the support of the posterior distribution). Also, batch means methods can be used (Jones et al., 2006) to estimate the 95% MC confidence interval (MCCI) for estimates of marginal posterior moments.

In Figures 1 and 2, we present the trace plots of the MCMC samples of the threshold estimates for three items and the ability estimates for six examinees. The trace plots suggest that the estimates for all parameters stabilized after the burn-in period. The trace plots for all other parameter estimates were similar. The posterior means for all parameters also had sufficiently small 95% Monte Carlo (MC) confidence intervals according to a consistent batch means estimator. For each item, the 95% MC confidence interval half-widths for the posterior means and standard deviations of the category threshold estimates are presented in Table 1.

The category threshold estimates ranged from -0.68 to 3.32. Similar to the conclusions in De Boeck and Wilson (2004), Item 21 was the most difficult (i.e., it had the largest estimates for the category thresholds), which suggests that examinees require a higher level of verbal aggression to endorse the higher rating categories for this item. Item 4 was the easiest (i.e., it had the smallest estimates for the category thresholds), which suggests that examinees require lower levels of verbal aggression to endorse the higher rating categories for this item.

A unique feature of the DDP-RM is the information it provides, through the posterior predictive density, about how examinees used the rating categories. That is, were clusters of examinees with different category threshold levels present for an item? We display these densities for three items in Figure 3. Notice that Items 1 and 23 exhibit greater variability in their densities than Item 2. Moreover, the density for Item 1's first threshold has a tri-modal form (i.e., one major and two minor modes), suggesting that three clusters of examinees with different levels of category thresholds are present. The density for the second threshold of this item is bimodal. The densities for Item 23's first and second thresholds each have a second minor mode, though the second mode is larger for the second threshold. The densities for Item 2, on the other hand, have only a single mode, suggesting that a single cluster of examinees exists with respect to category threshold estimates. Moreover, notice that the variability for this item is much smaller. Thus, a single set of category threshold estimates is much more appropriate for representing all examinees on Item 2 than on Items 1 and 23. Most of the items' posterior predictive densities consisted of a single mode. For the values of the modes for each threshold by item, please refer to Table 1. Traditional models do not provide such information. With traditional models, one has only the threshold estimates to compare across items, which could lead to misleading conclusions, as might be the case in this situation.

Through the neighborhood location and size parameters (i.e., $\gamma$ and $\psi$, respectively), the DDP-RM also provides information about the similarities in mixing distributions across items. Neighborhood location estimates ranged from 6.0 to 255.6. The neighborhood size estimates ranged from 7.5 to 19.8. None of the items have the same neighborhood location and size estimates (i.e., $\gamma_j \neq \gamma_{j'}$ and $\psi_j \neq \psi_{j'}$ for two different items $j$ and $j'$). Thus, none of the items share a common mixing distribution. For all items, the 95% MC confidence interval half-width ranged from .02 to .93 for the posterior mean and from .01 to .79 for the posterior standard deviation of the neighborhood location $\gamma$. The 95% MC confidence interval half-width ranged from .01 to .93 for the posterior mean and from .01 to .79 for the posterior standard deviation of the neighborhood size $\psi$. For the median and quartile range of the neighborhood location and size, please refer to Figure 4.

As with other common IRT models for rating data, the DDP-RM provides posterior means of examinee abilities, which represent the examinees' levels on the latent trait scale. The examinee ability estimates ranged from -2.37 to 3.74, with a mean of -0.02 and a standard deviation of 1.01. The 95% MC confidence interval half-width ranged from .01 to .03 for the mean of the posterior ability estimates, and from .00 to .03 for the standard deviation of the posterior ability estimates.
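The Monte Carlo confidence interval half-widths reported in this section come from a batch means estimator. The sketch below shows one common form of that calculation for the posterior mean of a single parameter. It is not the code used for the analysis: the batch length is assumed to be the square root of the chain length, a normal quantile is used in place of a Student t quantile, and the chain is simulated independent noise that stands in for real MCMC output.

```python
import math
import random
from statistics import NormalDist, mean

def batch_means_half_width(chain, level=0.95):
    """Half-width of the MC confidence interval for the mean of a chain."""
    n = len(chain)
    b = int(math.sqrt(n))   # batch length (assumed: about sqrt(n))
    a = n // b              # number of batches; trailing samples are dropped
    batch_avgs = [mean(chain[j * b:(j + 1) * b]) for j in range(a)]
    overall = mean(batch_avgs)
    # Batch means estimate of the asymptotic variance of the chain average.
    sigma2 = b * sum((m - overall) ** 2 for m in batch_avgs) / (a - 1)
    # Normal quantile used as an approximation to the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * math.sqrt(sigma2 / (a * b))

rng = random.Random(1)
chain = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # stand-in for MCMC draws
half_width = batch_means_half_width(chain)
```

For an uncorrelated chain of 10,000 standard normal draws the half-width should be close to 1.96 divided by 100; autocorrelated MCMC output would produce a wider interval for the same number of draws.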

3.2 Model Comparisons 

In this study, the comparison models were the partial credit model (PCM) (Masters, 1982), generalized partial credit model (GPCM) (Muraki, 1992), rating scale model (RSM) (Andrich, 1978), graded response model (GRM) (Samejima, 1969), nominal response model (NRM) (Bock, 1972), mixture partial credit model (mix-PCM) (Rost, 1991), and a covariate-independent DP mixture PCM that treated the category thresholds as random. All models except the latter two were fit using IRTPRO 2.1 (Cai, Thissen, & du Toit, 2011). The mix-PCM was fit in WINMIRA 2001 (von Davier, 2001), and a three-mixture PCM provided the best fit according to the AIC model selection criterion (Akaike, 1973). Based on preliminary analyses, a three-mixture PCM also achieved the best predictive performance compared to a one-, two-, four-, and five-mixture PCM. Thus, we report the predictive performance of the three-mixture PCM. The DP mixture PCM was fit in MATLAB (2012, The MathWorks, Natick, MA). The baseline distribution for the set of $m$ thresholds was a multivariate normal distribution with density $\mathrm{n}(0, I_m)$; the examinee abilities were assumed to follow a univariate normal distribution with density $\mathrm{n}(0, 1)$; and the precision parameter $\alpha$ was fixed to 1.

For the DDP-RM and the DP mixture PCM, the 95% Monte Carlo confidence interval half-widths were generally less than 1. Moreover, for each of $D_1(m)$, GF($m$), and Pen($m$), there was no overlap between any two models after accounting for the 95% MCCI. Table 2 contains the $D_1(m)$ values for all models included in the analysis of the verbal aggression data set. The DDP-RM outperformed all comparison models by at least 49 $D_1(m)$ units. In all, the three mixture models outperformed the traditional, single-mixture models, which suggests that more than one latent class is present in the data set. The finite-mixture Rasch PCM, while it outperformed the single-mixture models, was still bested by the two infinite-mixture models. That the DDP-RM outperformed the DP mixture PCM suggests that the items do not all share a common mixing distribution.






4 Conclusions 



We have introduced a novel Bayesian nonparametric rating scale IRT model, named the 
DDP-RM, which is an infinite-mixture model that is based on the local Dirichlet process 
formulation of the DDP. The model, through the posterior predictive distribution, describes 
how the examinees are using the rating categories. Specifically, it can reveal the number of 
possible groups of examinees that may be present for a given item threshold based on the 
number of modes displayed in the distribution. Moreover, we demonstrated that the new 
model can provide a substantially-better predictive fit of the rating data, compared to other 
IRT models of common usage. 

In future research, it would be of interest to extend the DDP-RM to have (infinitely-many) mixture weights that are more flexible than the stick-breaking weights of the DDP. For example, Karabatsos and Walker (2012) proposed novel mixture weights that are based on an infinite-ordered probits regression model with covariate dependence in the mean and variance. Alternatively, the (infinitely-many) mixture weights can be specified by a covariate-dependent version of normalized random measures (Regazzini, Lijoi, & Prünster, 2003; Lijoi, Mena, & Prünster, 2005, 2007; James, Lijoi, & Prünster, 2009).






References 



Akaike, H. (1973). Information theory and an extension of the maximum likelihood princi- 
ple. In International Symposium on Information Theory, 2nd Tsahkadsor, Armenian SSR 
(pp. 267-281). 

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 
43, 561-573. 

Cai, L., Thissen, D., & du Toit, S. (2011). IRTPRO: Flexible, multidimensional, multiple categorical IRT modeling. Chicago, IL: Scientific Software International.

Chung, Y. & Dunson, D. (2011). The local Dirichlet process. Annals of the Institute of 
Statistical Mathematics, 63, 59-80. 

De Boeck, P. & Wilson, M. (2004). Explanatory item response models: A generalized 
linear and nonlinear approach. Springer. 

Duncan, K. & MacEachern, S. (2008). Nonparametric Bayesian modelling for item response. 
Statistical Modelling, 8, 41-66. 

Escobar, M. & West, M. (1995). Bayesian density estimation and inference using mix- 
tures. Journal of the American Statistical Association, 90, 577-588. 

Ferguson, T. (1973). A Bayesian analysis of some nonparametric problems. Annals of 
Statistics, 1, 209-230. 

Frick, H., Strobl, C., Leisch, F., & Zeileis, A. (2012). Flexible Rasch mixture models with package psychomix. Journal of Statistical Software, 48, 1-25.

Gelfand, A. & Ghosh, S. (1998). Model choice: A minimum posterior predictive loss ap- 
proach. Biometrika, 85, 1-11. 

James, L., Lijoi, A., & Prünster, I. (2009). Posterior analysis for normalized random measures with independent increments. Scandinavian Journal of Statistics, 36, 76-97.

Jones, G., Haran, M., Caffo, B., & Neath, R. (2006). Fixed-width output analysis for Markov 
chain Monte Carlo. Journal of the American Statistical Association, 101, 1537-1547. 

Kalli, M., Griffin, J., & Walker, S. (2011). Slice sampling mixture models. Statistics and 
Computing, 21, 93-105. 

Karabatsos, G. & Walker, S. (2012). Adaptive-modal Bayesian nonparametric regression. 
Electronic Journal of Statistics, 6, 2038-2068. 






Karabatsos, G. & Walker, S. (2012 to appear). Bayesian nonparametric IRT. In W. van 
der Linden & R. Hambleton (Eds.), Handbook of Item Response Theory: Models, Statistical 
Tools, and Applications. New York: Taylor and Francis. 

Lijoi, A., Mena, R., & Prünster, I. (2005). Hierarchical mixture modeling with normalized inverse-Gaussian priors. Journal of the American Statistical Association, 100, 1278-1291.

Lijoi, A., Mena, R., & Prünster, I. (2007). Controlling the reinforcement in Bayesian nonparametric mixture models. Journal of the Royal Statistical Society, Series B, 69, 715-740.

Lo, A. (1984). On a class of Bayesian nonparametric estimates. Annals of Statistics, 12, 
351-357. 

MacEachern, S. (1999). Dependent nonparametric processes. Proceedings of the Bayesian 
Statistical Sciences Section of the American Statistical Association (pp. 50-55). 

MacEachern, S. (2000). Dependent Dirichlet Processes. Technical report, Department of 
Statistics, The Ohio State University. 

MacEachern, S. (2001). Decision theoretic aspects of dependent nonparametric processes. In 
E. George (Ed.), Bayesian Methods with Applications to Science, Policy and Official Statis- 
tics (pp. 551-560). Creta: International Society for Bayesian Analysis. 

Masters, G. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.

McLachlan, G. & Peel, D. (2000). Finite mixture models. Wiley- Interscience. 

Miyazaki, K. & Hoshino, T. (2009). A Bayesian semiparametric item response model with 
Dirichlet process priors. Psychometrika, 74, 375-393. 

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176.

Regazzini, E., Lijoi, A., & Prünster, I. (2003). Distributional results for means of normalized random measures with independent increments. Annals of Statistics, 31, 560-585.

Roberts, G. & Rosenthal, J. (2009). Examples of adaptive MCMC. Journal of Compu- 
tational and Graphical Statistics, 18, 349-367. 

Rost, J. (1991). A logistic mixture distribution model for polytomous item responses. British Journal of Mathematical and Statistical Psychology, 44, 75-92.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. 
Psychometrika Monograph Supplement. 






Samejima, F. (1972). A general model for free response data. Psychometrika Monograph, 18. 

San Martin, E., Jara, A., Rolin, J., & Mouchart, M. (2011). On the Bayesian nonpara- 
metric generalization of IRT-type models. Psychometrika, 76, 385-409. 

Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, 4, 
639-650. 

Smit, A., Kelderman, H., & van der Flier, H. (2003). Latent trait latent class analysis 
of an Eysenck Personality Questionnaire. Methods of Psychological Research Online, 8, 23- 
50. 

von Davier, M. (2001). WINMIRA 2001. [Computer software]. St. Paul, MN: Assess- 
ment Systems Corporation. 

von Davier, M. & Yamamoto, K. (2004). Partially observed mixtures of IRT models: An 
extension of the generalized partial-credit model. Applied Psychological Measurement, 28, 
389-406. 



APPENDIX A: MCMC Sampling Methods 



We implement the MCMC sampling method of Kalli et al. (2011) to estimate our 
infinite-mixture IRT model. This method introduces strategic latent variables so that 
exact MCMC algorithms can be used to estimate the model's posterior distribution. That is, 
for our DDP-RM (Section 2), we introduce the latent variables 
$(u_i, z_i \in \mathbb{Z}^{+})_{i=1}^{n}$ and a decreasing function $\xi_h = \exp(-h)$, 
so that the model's data likelihood can be written as the joint distribution: 

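$$\prod_{i=1}^{n} f(y_i, u_i, z_i \mid \mathbf{x}_i) = \prod_{i=1}^{n} \mathbf{1}(0 < u_i < \xi_{z_i})\, \xi_{z_i}^{-1}\, f(y_i \mid \theta_{t(i)}, \tau_{z_i})\, \omega_{z_i}(\mathbf{x}_i), \qquad (6)$$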

where $\theta_{t(i)}$ denotes the ability of examinee $t$ who corresponds to rating $y_i$, 
and where $\mathbf{1}(\cdot)$ is the indicator function. Marginalizing over each of the 
latent variables $(u_i, z_i)$ in Equation 6, for each $i = 1, \ldots, n$, returns the 
original likelihood, 

$$\prod_{i=1}^{n} \left\{ \sum_{h=1}^{\infty} f(y_i \mid \theta_{t(i)}, \tau_h)\, \omega_h(\mathbf{x}_i) \right\}$$

of our infinite-dimensional IRT model. Thus, conditional on the latent variables, the 
model can be characterized as a finite-dimensional model, which in turn permits the use of 
standard MCMC methods to sample the model's full joint posterior distribution. Given all 
variables save the $(z_i)_{i=1}^{n}$, the choice of each $z_i$ has minimum 1 and maximum 
$N_{\max}$, where $N_{\max} = \max_i [\max_h \mathbf{1}(u_i < \xi_h)\, h]$. 
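The truncation level can be computed directly from the slice variables. The following is a minimal Python sketch, not the authors' MATLAB code: it assumes $\xi_h = \exp(-h)$ as stated above, and the function names `xi` and `truncation_level`, as well as the example allocations `z`, are hypothetical. Because $u_i < \exp(-h)$ holds exactly when $h < -\log u_i$, the inner maximum has a closed form.

```python
import numpy as np

def xi(h):
    """Decreasing sequence xi_h = exp(-h), as defined in the text."""
    return np.exp(-np.asarray(h, dtype=float))

def truncation_level(u):
    """N_max = max_i max{h : u_i < xi_h}.

    u_i < exp(-h) iff h < -log(u_i), so the largest admissible integer h
    for observation i is ceil(-log(u_i)) - 1.
    """
    u = np.asarray(u, dtype=float)
    per_observation = np.ceil(-np.log(u)).astype(int) - 1
    return int(per_observation.max())

# Hypothetical current allocations z_i (illustration only).
rng = np.random.default_rng(0)
z = np.array([1, 3, 2, 1, 5])
# Step 1 of the sampler: u_i | z_i ~ Uniform(0, xi_{z_i}).
u = rng.uniform(0.0, xi(z))
# Only components h = 1, ..., n_max can receive positive probability for z_i.
n_max = truncation_level(u)
```

Since each $u_i$ is drawn below $\xi_{z_i}$, the resulting `n_max` is never smaller than the largest current allocation, so every occupied component stays in the finite candidate set.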

Specifically, for each $i = 1, \ldots, n$ and $t = 1, \ldots, T$, each of the model 
parameters is sampled from its corresponding full conditional posterior distribution, at 
each stage $s$ ($s = 1, \ldots, S$) of the MCMC algorithm. We assume the same prior form 
as in the empirical illustration of our model, the analysis of the verbal aggression data 
set in Section 3. The full conditional posterior distributions for the blocks of model 
parameters are as follows: 

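1. $\pi(u_i \mid \ldots) = \mathrm{un}(u_i \mid 0, \xi_{z_i})$; 

2. $\pi(z_i = h \mid \ldots) \propto \mathbf{1}(u_i < \xi_h)\, \xi_h^{-1}\, f(y_i \mid \theta_{t(i)}, \tau_h)\, \omega_h(\mathbf{x}_i)$, $h = 1, \ldots, N_{\max}$; 

3. $\pi(\theta_t \mid \ldots) \propto \mathrm{n}(\theta_t \mid 0, \sigma^2_\theta) \prod_{i: t(i) = t} f(y_i \mid \theta_{t(i)}, \tau_{z_i})$; 

4. $\pi(\gamma \mid \ldots) \propto \mathrm{un}_p(\gamma \mid a_\gamma, b_\gamma) \prod_{i=1}^{n} v_{z_i} \prod_{l < z_i} (1 - v_l)$; 

5. $\pi(\psi(\mathbf{x}) \mid \ldots) \propto \mathrm{un}(\psi(\mathbf{x}) \mid a_\psi, b_\psi) \prod_{i=1}^{n} v_{z_i} \prod_{l < z_i} (1 - v_l)$; 

6. $\pi(\tau_h \mid \ldots) \propto \mathrm{n}_m(\tau_h \mid \mathbf{0}, \Sigma_\tau) \prod_{i: z_i = h} f(y_i \mid \theta_{t(i)}, \tau_{z_i})$, $h = 1, \ldots, N_{\max}$; 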



7. $\pi(v_h \mid \ldots) = \mathrm{beta}\left(v_h \,\middle|\, 1 + \sum_{i=1}^{n} \mathbf{1}(z_i = h),\; \alpha + \sum_{i=1}^{n} \mathbf{1}(z_i > h)\right)$, $h = 1, \ldots, N_{\max}$; 

8. $\pi(\alpha \mid \ldots) = \mathrm{ga}(\alpha \mid a_\alpha + n_{clus} - \mathbf{1}(u > \{O/(1 + O)\}),\; b_\alpha - \log(\eta))$, 
given draws $\eta \sim \mathrm{Beta}(\alpha + 1, n)$, $u \sim \mathrm{Uniform}(0, 1)$, and 
$O = (a_\alpha + n_{clus} - 1)/(\{b_\alpha - \log(\eta)\}\, n)$, where $n_{clus}$ is the 
number of unique $z_i$ over $(i = 1, \ldots, n)$ (Escobar & West, 1995, p. 584). 
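The update in Step 8 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' MATLAB implementation: the function name `sample_alpha` and the example values are hypothetical, and the gamma density is assumed to be parameterized by shape and rate (so NumPy's scale argument is the reciprocal of the rate).

```python
import numpy as np

def sample_alpha(alpha, n, n_clus, a_alpha, b_alpha, rng):
    """One draw of the precision parameter alpha, following Step 8.

    alpha    : current value of alpha
    n        : number of observations
    n_clus   : number of unique allocations z_i
    a_alpha, b_alpha : shape and rate of the gamma prior (rate form assumed)
    """
    eta = rng.beta(alpha + 1.0, n)                  # auxiliary draw eta ~ Beta(alpha + 1, n)
    rate = b_alpha - np.log(eta)                    # always > b_alpha because 0 < eta < 1
    odds = (a_alpha + n_clus - 1.0) / (rate * n)    # the quantity O in Step 8
    u = rng.uniform()
    # Shape is a_alpha + n_clus with probability O / (1 + O), else one less.
    shape = a_alpha + n_clus - float(u > odds / (1.0 + odds))
    return rng.gamma(shape, 1.0 / rate)

# Hypothetical usage: 100 ratings currently occupying 5 mixture components.
rng = np.random.default_rng(1)
alpha_draws = np.array(
    [sample_alpha(1.0, n=100, n_clus=5, a_alpha=1.0, b_alpha=1.0, rng=rng)
     for _ in range(2000)]
)
```

In a full sampler the returned value would replace the current `alpha` before the next sweep; here the current value is held fixed only to keep the example short.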



Standard MCMC Gibbs sampling methods can be used to sample the full conditionals 
in Steps 1, 2, 7, and 8. The full conditionals in Steps 3 through 6 are each sampled 
using an adaptive random-walk Metropolis-Hastings algorithm (Roberts & Rosenthal, 2009). 
The above 8-step sampling algorithm is repeated a large number $S$ of times to construct 
a discrete-time Harris ergodic Markov chain 
$\{\zeta^{(s)} = (\theta, \tau, v, \alpha, \gamma, \psi)^{(s)}\}_{s=1}^{S}$, having the 
posterior distribution $\Pi(\zeta \mid \mathcal{D}_n)$ as its stationary distribution, 
provided that a proper prior is assigned to $\zeta$. We have written MATLAB (2012, The 
MathWorks, Natick, MA) code that implements the described MCMC sampling algorithm. 
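For readers unfamiliar with adaptive random-walk Metropolis-Hastings, the sketch below shows one common variant from Roberts & Rosenthal (2009): the log proposal standard deviation is nudged up or down after each batch of 50 iterations, by an amount that shrinks over time, toward an acceptance rate of about 0.44. The text does not say which variant or tuning constants the authors used, so the batch size, target rate, function name `adaptive_rwm`, and the standard normal toy target are all assumptions made for illustration.

```python
import numpy as np

def adaptive_rwm(log_target, x0, n_iter, rng, batch=50, target_acc=0.44):
    """Adaptive random-walk Metropolis for a scalar parameter.

    Returns the chain and the final proposal standard deviation.
    """
    x = float(x0)
    lp = log_target(x)
    log_sd = 0.0                       # log of the proposal standard deviation
    draws = np.empty(n_iter)
    accepted_in_batch = 0
    for s in range(1, n_iter + 1):
        proposal = x + np.exp(log_sd) * rng.standard_normal()
        lp_prop = log_target(proposal)
        # Symmetric proposal: accept with probability min(1, target ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            accepted_in_batch += 1
        draws[s - 1] = x
        if s % batch == 0:
            # Diminishing adaptation: step size delta shrinks with the batch index.
            delta = min(0.01, (s // batch) ** -0.5)
            if accepted_in_batch / batch > target_acc:
                log_sd += delta        # accepting too often: take larger steps
            else:
                log_sd -= delta        # accepting too rarely: take smaller steps
            accepted_in_batch = 0
    return draws, float(np.exp(log_sd))

# Toy target: standard normal log-density up to a constant.
rng = np.random.default_rng(2)
chain, final_sd = adaptive_rwm(lambda x: -0.5 * x * x, 0.0, 20000, rng)
```

In the sampler described above, a step of this kind would be applied to each of the blocks in Steps 3 through 6, with `log_target` replaced by the log of the corresponding full conditional.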



APPENDIX B: List of Verbal Aggression Items 

For the Verbal Aggression Questionnaire, the 24 items are as follows, with each item 
scored on a 0-2 rating scale. 

1. A bus fails to stop for me. I would want to curse. 

2. A bus fails to stop for me. I would want to scold. 

3. A bus fails to stop for me. I would want to shout. 

4. I miss a train because a clerk gave me faulty information. I would want to curse. 

5. I miss a train because a clerk gave me faulty information. I would want to scold. 

6. I miss a train because a clerk gave me faulty information. I would want to shout. 

7. The grocery store closes just as I am about to enter. I would want to curse. 

8. The grocery store closes just as I am about to enter. I would want to scold. 

9. The grocery store closes just as I am about to enter. I would want to shout. 

10. The operator disconnects me when I had used up my last 10 cents for a call. I would 
want to curse. 

11. The operator disconnects me when I had used up my last 10 cents for a call. I would 
want to scold. 

12. The operator disconnects me when I had used up my last 10 cents for a call. I would 
want to shout. 

13. A bus fails to stop for me. I would curse. 

14. A bus fails to stop for me. I would shout. 

15. A bus fails to stop for me. I would scold. 

16. I miss a train because a clerk gave me faulty information. I would curse. 

17. I miss a train because a clerk gave me faulty information. I would scold. 

18. I miss a train because a clerk gave me faulty information. I would shout. 

19. The grocery store closes just as I am about to enter. I would curse. 

20. The grocery store closes just as I am about to enter. I would scold. 

21. The grocery store closes just as I am about to enter. I would shout. 

22. The operator disconnects me when I had used up my last 10 cents for a call. I would 
curse. 

23. The operator disconnects me when I had used up my last 10 cents for a call. I would 
scold. 

24. The operator disconnects me when I had used up my last 10 cents for a call. I would 
shout. 

Each of the 24 items above is classified as a curse, scold, or shout item. In addition, 
items 1-6 and items 13-18 are the Other-to-Blame items, and items 7-12 and items 19-24 
are the Self-to-Blame items. Finally, items 1-12 are Behavior Mode: Want items, and 
items 13-24 are Behavior Mode: Do items. 
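The item classification just described can be written down as a small lookup table. The Python sketch below is only an illustration: the names `BEHAVIORS`, `item_covariates`, and `design`, and the string labels used for the levels, are hypothetical, and the behavior types follow the order in which the items are listed above.

```python
# Behavior type of items 1-24, in the order the items are listed above.
BEHAVIORS = (["curse", "scold", "shout"] * 4
             + ["curse", "shout", "scold"]
             + ["curse", "scold", "shout"] * 3)

def item_covariates(item):
    """Return the three design covariates for a 1-based item number."""
    if not 1 <= item <= 24:
        raise ValueError("item must be between 1 and 24")
    # Items 1-6 and 13-18 are Other-to-Blame; the rest are Self-to-Blame.
    other_to_blame = 1 <= item <= 6 or 13 <= item <= 18
    return {
        "item": item,
        "behavior": BEHAVIORS[item - 1],
        "blame": "other" if other_to_blame else "self",
        # Items 1-12 are Want items; items 13-24 are Do items.
        "mode": "want" if item <= 12 else "do",
    }

design = [item_covariates(i) for i in range(1, 25)]
```

Each of the three behavior types appears eight times, and each level of the blame and mode factors covers twelve items, which gives a balanced item design.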



Item   Mean $\tau_1$   SD $\tau_1$   Mean $\tau_2$   SD $\tau_2$   Modes $\tau_1$      Modes $\tau_2$
1      -.42            1.27          -.03            1.87          -.05, -.91, 1.40    .56, -.54
2      .06             .83           .20             .85           .05                 .22
3      .28             .85           1.09            1.00          .43                 1.36
4      -.68            1.47          .09             1.55          -.30, -.90          .56, -1.65, -2.31, -.50, 1.
5      -.10            .25           .25             .26           -.16                .19
6      .33             1.74          .67             1.21          -.19                .56
7      -.14            .87           1.11            1.43          -.40                1.51
8      .82             .29           2.01            .42           .84                 2.04
9      1.52            .52           2.75            .70           1.60                2.78, 3.82
10     -.63            .52           .70             .56           -.77                .67
11     .63             .47           1.29            .59           .71                 1.36
12     1.28            1.05          1.70            1.16          1.64                2.02
13     -.61            .46           .21             .47           -.63                .24
14     .14             .72           .63             1.2           -.06                .84
15     1.15            .86           1.69            1.58          1.38, .22           2.23
16     -.25            .92           .20             1.24          -.46                .33
17     .48             .77           1.04            1.35          .46                 1.29
18     1.62            1.00          2.17            1.2           1.94                2.47
19     .89             .64           2.12            .93           1.02                2.25
20     .96             .38           2.24            .56           1.10                2.21
21     2.87            .52           3.31            .77           2.92                3.22
22     -.22            .86           .80             1.34          -.48                1.06
23     .61             1.83          1.01            1.27          -.15, 2.67          .64, 1.05
24     2.06            .07           2.56            .89           2.17                2.45



Table 1: For the DDP-RM, the posterior estimates of the ordered category threshold 
parameters, by item. For the posterior mean and SD estimates, the 95 percent MCCI 
half-width typically ranged between .00 and .03, with maximum .05. 



Model (m)        D(m)    GF(m)   Pen(m)
DDP-RM           4984    2008    2976
DP-PCM           5033    2077    2956
3-Mixture PCM    5163    2485    2679
PCM              5716    2783    2934
GPCM             5686    2774    2912
RSM              5726    2791    2936
NRM              5689    2774    2915
GRM              5709    2783    2925

Table 2: The overall mean-squared predictive error, the goodness of fit, and penalty, by 
model. 
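As a quick arithmetic check on Table 2, the short Python sketch below assumes that the predictive criterion decomposes as D(m) = GF(m) + Pen(m), which is what the reported columns suggest, and then ranks the models by D(m). The variable names are hypothetical; the numbers are copied from the table.

```python
# (D, GF, Pen) for each model, copied from Table 2.
table2 = {
    "DDP-RM": (4984, 2008, 2976),
    "DP-PCM": (5033, 2077, 2956),
    "3-Mixture PCM": (5163, 2485, 2679),
    "PCM": (5716, 2783, 2934),
    "GPCM": (5686, 2774, 2912),
    "RSM": (5726, 2791, 2936),
    "NRM": (5689, 2774, 2915),
    "GRM": (5709, 2783, 2925),
}

# Difference between the reported D(m) and GF(m) + Pen(m), per model.
gaps = {m: d - (gf + pen) for m, (d, gf, pen) in table2.items()}

# Models ordered from smallest to largest D(m); smaller means better prediction.
ranking = sorted(table2, key=lambda m: table2[m][0])
```

The gaps are at most one unit in absolute value, which is consistent with rounding of the reported figures, and the ranking places the DDP-RM first, followed by the DP-PCM and the 3-Mixture PCM.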



Figure Captions 

Figure 1. Traceplots of the MCMC posterior samples of the threshold estimates for three 
items. 

Figure 2. Traceplots of the MCMC posterior samples of the ability estimates for six 
examinees. 

Figure 3. The posterior predictive density of the rating category thresholds for three 
items. 

Figure 4. Median, interquartile, and 95-percentile range of the posterior distribution for 
the neighborhood location ($\gamma$) and size ($\psi$) by item. 



[Figure: traceplot panels titled "Examinee 67" and "Examinee 82"; horizontal axis: MCMC Stage (x10^4).] 

[Figure: two interval-plot panels; axes: Estimate and Item (1-24).] 



